lectures.alex.balgavy.eu

Lecture notes from university.



+++
title = 'Shared-memory multiprocessors'
+++
# Shared-memory multiprocessors
a multiprocessor system has multiple processors that can work on different tasks at the same time

in a shared-memory multiprocessor, all processors have access to the same memory (probably large)

memory is distributed across multiple modules, connected by an interconnection network

when memory is physically separate from the processors, all requests go through a network, introducing latency

if you have the same latency for memory access from all processors, you have a Uniform Memory Access (UMA) multiprocessor (but latency doesn’t magically go away)

to improve performance, put a memory module next to each processor
this leads to a collection of “nodes”, each with a processor and memory module

each node is connected to the network. no network latency when a memory request is local, but a remote request has to go through the network

these are Non-Uniform Memory Access ([NUMA](https://youtu.be/jRx5PrAlUdY?t=1m39s)) multiprocessors

![screenshot.png](screenshot-25.png)

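the local/remote distinction can be sketched as a toy cost model — the latency numbers below are invented for illustration, not from the lecture:

```python
# Toy model of NUMA access cost: a local access skips the interconnect,
# a remote access pays an extra network penalty.
# The cycle counts are made-up illustrative values.

LOCAL_LATENCY = 10    # cycles: processor -> its own memory module
NETWORK_LATENCY = 90  # cycles: extra cost of crossing the interconnect

def access_latency(requesting_node, memory_node):
    """Latency of one memory access in a NUMA system, in cycles."""
    if requesting_node == memory_node:
        return LOCAL_LATENCY                 # local: no network traversal
    return LOCAL_LATENCY + NETWORK_LATENCY   # remote: pay the network cost

print(access_latency(0, 0))  # local access: 10
print(access_latency(0, 3))  # remote access: 100
```

in a UMA machine, every access would instead pay the same (network-inflated) cost regardless of which node asks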
## Interconnection networks
suitability is judged in terms of:

- bandwidth — capacity of a transmission link to transfer data (bits or bytes per second)
- effective throughput — actual rate of data transfer
- packets — form of data (fixed length and specified format, ideally handled in one clock cycle)
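the gap between bandwidth and effective throughput can be seen in a small calculation — the packet sizes and the header-overhead-only model here are assumptions for illustration:

```python
# Bandwidth is the link's raw capacity; effective throughput is what the
# payload actually achieves once per-packet overhead (headers etc.) eats
# into it. Only header overhead is modelled here; real links lose more.

def effective_throughput(bandwidth_bps, payload_bytes, header_bytes):
    """Payload data rate on a link, given fixed per-packet overhead."""
    packet_bytes = payload_bytes + header_bytes
    return bandwidth_bps * payload_bytes / packet_bytes

# a 1 Gb/s link moving 64-byte payloads with 16 bytes of header per packet
print(effective_throughput(1_000_000_000, 64, 16))  # 800000000.0 (800 Mb/s)
```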

types commonly used:

- buses — set of wires that provide a single shared path for info transfer
    - suitable for a small number of processors (low contention)
    - does not allow a new request until the response to the current request is provided
    - alternative is a split-transaction bus, where other events can occur between a request and its response
- ring — point-to-point connections between nodes
    - low-latency option 1: bidirectional ring
        - halves latency, doubles bandwidth
        - increases complexity
    - low-latency option 2: hierarchy of rings
        - upper-level ring connects lower-level rings
        - average latency is reduced
        - upper-level ring may become a bottleneck if lower-level rings communicate frequently
- crossbar — direct link between any pair of units
    - used in UMA multiprocessors to connect processors to memory modules
    - enables many simultaneous transfers, as long as no destination receives multiple requests at once
- mesh — like a net over all nodes
    - each node connects to its horizontal and vertical neighbours
    - wraparound connections can be introduced at the edges — “torus”
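the claim that a bidirectional ring halves latency can be checked by counting hops — a small sketch, assuming hop count is a stand-in for latency:

```python
# Average hop count between two distinct nodes on a ring of n nodes.
# A unidirectional ring must go the long way around; a bidirectional
# ring picks the shorter direction, roughly halving the average.

def avg_hops(n, bidirectional):
    """Mean hop distance over all ordered pairs of distinct nodes."""
    dists = []
    for src in range(n):
        for dst in range(n):
            if src == dst:
                continue
            forward = (dst - src) % n        # hops going one way around
            if bidirectional:
                dists.append(min(forward, n - forward))  # shorter direction
            else:
                dists.append(forward)
    return sum(dists) / len(dists)

print(avg_hops(8, bidirectional=False))  # 4.0 hops on average
print(avg_hops(8, bidirectional=True))   # ~2.29 hops: roughly half
```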
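the crossbar's condition for parallel transfers — no destination asked twice — is simple to state in code; the tuple representation of requests is my own choice for illustration:

```python
# In a crossbar, any set of transfers can proceed in parallel as long as
# no two requests target the same destination (e.g. memory module).
# Each request is modelled as a (source, destination) pair.

def can_transfer_simultaneously(requests):
    """True if all requested destinations are distinct."""
    destinations = [dst for _, dst in requests]
    return len(destinations) == len(set(destinations))

print(can_transfer_simultaneously([(0, 1), (1, 2), (2, 3)]))  # True
print(can_transfer_simultaneously([(0, 1), (2, 1)]))          # False: module 1 contended
```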
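the difference between a mesh and a torus is just what happens at the edges, which a neighbour function makes concrete — grid coordinates and dimensions here are illustrative:

```python
# Neighbours of node (x, y) in a width x height grid.
# A plain mesh drops links at the edges; a torus wraps them around,
# so every node keeps exactly four neighbours.

def neighbours(x, y, width, height, torus):
    steps = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # horizontal + vertical
    result = []
    for dx, dy in steps:
        nx, ny = x + dx, y + dy
        if torus:
            result.append((nx % width, ny % height))   # wraparound link
        elif 0 <= nx < width and 0 <= ny < height:
            result.append((nx, ny))                    # edge nodes lose links
    return result

print(neighbours(0, 0, 4, 4, torus=False))  # corner node: only 2 neighbours
print(neighbours(0, 0, 4, 4, torus=True))   # torus: always 4 neighbours
```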